Building a Terminological Database from Heterogeneous Definitional Sources

Authors

  • Smaranda Muresan
  • Samuel D. Popper
  • Peter T. Davis
  • Judith L. Klavans
Abstract

An obstacle to understanding results across heterogeneous databases is the difficulty of determining conceptual connections between differing terminologies. In this paper, we present the two-step approach that we used to build a terminological database to address this issue. First, we automatically built a heterogeneous collection of terms and definitions from two types of dynamic sources: 1) glossaries automatically identified from 147 government web sites and 2) definitions extracted from 600 unstructured articles. After storing the terms and their definitions, we semantically analyzed the definitions to store the terminological knowledge in a relational database. Currently the database contains 12,780 definitions of 8,431 terms.

1 Motivation for a Terminological Database

Due in part to the rapid growth of the Internet, individuals and researchers have unprecedented access to data from many sources. Failure to properly understand the concepts behind these data sets may lead to erroneous conclusions in data analysis. We have built an automated system that can discover and make available concepts, definitions and inter-definitional relationships from several Internet sources by building a terminological database.

In building this database, we addressed two desiderata for terminological resources: 1) to enable representation of the ongoing evolution of language, since new words and senses continually appear, and 2) to allow users to explore the richness of terminological knowledge (e.g., how terms relate to each other and what their semantic properties and attributes are). Our goal is to provide easy and efficient access for users of multiple information sources across government sites.

Towards meeting these desiderata, as a first step, we automatically built a heterogeneous collection of terms and definitions from dynamic sources.
As a second step, we semantically analyzed these definitions in order to identify relations among terms and their attributes. We then built a relational database. The architecture of the system is shown in Figure 1.

[Figure 1: System architecture. Recoverable labels: Terminological database, ParseGloss.]
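
The storage side of the two steps described above (collect terms with their definitions, then keep them in a relational database) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' schema: the table names `term` and `definition`, the `source_type` column, and the helper functions `open_db`, `add_definition`, `definitions_for` and `counts` are all hypothetical, and the sample rows are made up. The semantic analysis step is not modeled here.

```python
import sqlite3

# Hypothetical schema: one row per term, many definitions per term.
# source_type records which of the two source kinds a definition came from.
SCHEMA = """
CREATE TABLE term (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE definition (
    id          INTEGER PRIMARY KEY,
    term_id     INTEGER NOT NULL REFERENCES term(id),
    text        TEXT NOT NULL,
    source_type TEXT NOT NULL CHECK (source_type IN ('glossary', 'article'))
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (assumed) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_definition(conn, term, text, source_type):
    """Store one definition, creating the term row if it is new."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (term,))
    (term_id,) = conn.execute(
        "SELECT id FROM term WHERE name = ?", (term,)
    ).fetchone()
    conn.execute(
        "INSERT INTO definition (term_id, text, source_type) VALUES (?, ?, ?)",
        (term_id, text, source_type),
    )
    conn.commit()


def definitions_for(conn, term):
    """Return (text, source_type) pairs for a term, in insertion order."""
    rows = conn.execute(
        "SELECT d.text, d.source_type FROM definition d "
        "JOIN term t ON t.id = d.term_id WHERE t.name = ? ORDER BY d.id",
        (term,),
    )
    return rows.fetchall()


def counts(conn):
    """Return (number of terms, number of definitions)."""
    (n_terms,) = conn.execute("SELECT COUNT(*) FROM term").fetchone()
    (n_defs,) = conn.execute("SELECT COUNT(*) FROM definition").fetchone()
    return n_terms, n_defs


if __name__ == "__main__":
    conn = open_db()
    # Made-up sample data, for illustration only.
    add_definition(conn, "emission", "The release of a substance into the air.", "glossary")
    add_definition(conn, "emission", "A discharge of pollutants from a source.", "article")
    print(definitions_for(conn, "emission"))
    print(counts(conn))
```

Keeping terms and definitions in separate tables lets one term carry several definitions from different sources, which matches the abstract's report of more definitions than terms.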


Similar resources

Building a Federated Relational Database System: An Approach Using a Knowledge-Based System

Due to the emerging interest in integrating different application environments, there have been many recent proposals for federated systems. In this paper, a federated system that permits the integration of heterogeneous relational databases using a terminological knowledge representation system is presented. In particular, two of the system's components: the translator and the integrator are ex...


Knowledge Representation and Reasoning with Definitional Taxonomies

We provide a detailed overview of knowledge representation issues in general and terminological knowledge representation in particular. Terminological knowledge representation, which originated with KL-ONE, is an object-centered approach in the tradition of semantic networks and frames. Terminological systems share three distinguishing characteristics: (1) They are intended to support the defin...


A Large-Scale Resource for Storing and Recognizing Technical Terminology

This paper discusses the design and implementation of Termino, a large-scale terminological resource for text processing. Dealing with terminology is a difficult but unavoidable task for natural language processing applications, such as information extraction in technical domains. Complex, heterogeneous information must be stored about large numbers of terms. At the same time term recognition m...


A Large Scale Terminology Resource For Biomedical Text Processing

In this paper we discuss the design, implementation, and use of Termino, a large scale terminological resource for text processing. Dealing with terminology is a difficult but unavoidable task for language processing applications, such as Information Extraction in technical domains. Complex, heterogeneous information must be stored about large numbers of terms. At the same time term recognition...


ECODE: A Definition Extraction System

Terminological work aims to identify knowledge about terms in specialised texts in order to compile dictionaries, glossaries or ontologies. Searching for definitions about the terms that terminographers intend to define is therefore an essential task. This search can be done in specialised corpus, where they usually appear in definitional contexts, i.e. text fragments where an author explicitly...




Publication date: 2003